perf(slirp): port-forward listener on epoll (50ms → 180µs accept latency)#72
Closed
Adds two planning docs under docs/superpowers/plans/:

- 2026-04-27-smoltcp-passt-port.md (spec): supersedes the 2026-04-12 network-backend-abstraction design. Replaces "add passt as opt-in backend" with "lift passt's design patterns into our smoltcp stack" — keeps observability, the all-Rust path, single binary, cross-platform parity. Lists required skills for execution (rust-style, rustdoc, rust-analyzer-ssr, superpowers TDD/verification, repo verify/profile). Maps the work into 5+1 phases with per-phase plan-doc placeholders.
- 2026-04-27-smoltcp-passt-port-phase0.md (Phase 0 plan): 25 bite-sized TDD tasks: correctness baseline pins, divan microbenches, a wall-clock e2e harness, NetworkBackend trait extraction, SlirpStack → SmoltcpBackend rename. Includes three BROKEN_ON_PURPOSE assertions that flip in later phases.
Add two baseline tests for the smoltcp DNS proxy:

- dns_query_resolves: sends a query for example.com, polls ≤20×100ms, asserts the reply XID matches.
- dns_cache_keys_by_question_not_xid: warms the cache with xid=1, then queries with xid=2 and asserts the stack rewrites the reply XID.

Both tests skip gracefully (eprintln + early return) when the upstream resolver is unreachable, making them safe in offline CI. Also adds a QNAME_EXAMPLE_COM const and two module-scope helpers: build_dns_query (builds a correct UDP DNS frame with proper payload_len) and parse_dns_reply_xid. SLIRP_DNS_IP is added to the existing module-scope slirp import.
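A DNS message starts with a 16-bit big-endian transaction ID (XID), so building a query header and recovering the reply XID are both two-byte operations. The sketch below is illustrative only — the names mirror the helpers described above, but the bodies are assumptions, and the real build_dns_query also wraps the message in a full UDP/IP/Ethernet frame:

```rust
// Hypothetical sketch of the XID handling; the real helpers build a
// complete UDP DNS frame, which is omitted here.
fn build_dns_query_header(xid: u16) -> Vec<u8> {
    let mut msg = Vec::new();
    msg.extend_from_slice(&xid.to_be_bytes()); // XID, big-endian
    msg.extend_from_slice(&[0x01, 0x00]);      // flags: standard query, RD=1
    msg.extend_from_slice(&[0x00, 0x01]);      // QDCOUNT = 1
    msg.extend_from_slice(&[0x00; 6]);         // AN/NS/ARCOUNT = 0
    msg
}

fn parse_dns_reply_xid(msg: &[u8]) -> Option<u16> {
    // Too short to carry a DNS header → no XID to recover.
    let bytes: [u8; 2] = msg.get(0..2)?.try_into().ok()?;
    Some(u16::from_be_bytes(bytes))
}

fn main() {
    let q = build_dns_query_header(0xBEEF);
    assert_eq!(parse_dns_reply_xid(&q), Some(0xBEEF));
    assert_eq!(parse_dns_reply_xid(&[0x12]), None);
    println!("ok");
}
```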
Implement measure_tcp_throughput_g2h: binds a host-side TCP listener, boots a VM, execs dd|nc in the guest, drains to EOF on the host, and computes Mbps from bytes_received / elapsed. h2g left None with a TODO.
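The Mbps arithmetic the commit describes can be written as a one-line helper. This is an illustrative stand-in, not the bench's actual code:

```rust
use std::time::Duration;

// Megabits per second from a byte count and elapsed wall-clock time,
// as computed conceptually by measure_tcp_throughput_g2h.
fn throughput_mbps(bytes_received: u64, elapsed: Duration) -> f64 {
    let bits = bytes_received as f64 * 8.0;
    bits / elapsed.as_secs_f64() / 1_000_000.0
}

fn main() {
    // 125 MB drained in one second is exactly 1000 Mbps.
    let mbps = throughput_mbps(125_000_000, Duration::from_secs(1));
    assert!((mbps - 1000.0).abs() < 1e-9);
    println!("{mbps:.0} Mbps");
}
```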
Implements measure_rr_latency and measure_crr_latency in voidbox-network-bench, reusing the single shared VM booted for throughput measurements. RR: guest pipes N bytes over one persistent nc connection; host times each read+write pair (first sample discarded to absorb connect jitter). CRR: guest runs N independent nc invocations; host times each full accept+read+write+close cycle. Both use the existing percentile() helper (dead_code attribute removed). Latency measurements always run regardless of --no-throughput.
Per user feedback: "Slirp" denotes the user-mode-NAT role; "smoltcp" is the underlying library. Role-based naming keeps the public type surface stable across library swaps and matches the symmetry of future TapBackend / VhostNetBackend siblings. Module file src/network/slirp.rs keeps its name (already aligned with the new type, matches src/devices/virtio_net.rs convention).
The actual polling logic now lives in drain_to_guest, which writes
directly into the caller-supplied &mut Vec<Vec<u8>> buffer — no fresh
allocation on every tick. poll becomes a #[deprecated] shim:
#[deprecated(note = "use drain_to_guest")]
pub fn poll(&mut self) -> Vec<Vec<u8>> {
let mut out = Vec::new();
self.drain_to_guest(&mut out);
out
}
Existing call sites (virtio_net.rs, tests/network_baseline.rs,
benches/network.rs) are annotated with #[allow(deprecated)] and a
TODO(0D.4/0D.5) marker. They will be migrated in the next two tasks,
after which the allow attributes can be removed.
Switch VirtioNetDevice::slirp from Arc<Mutex<SlirpStack>> to Arc<Mutex<dyn NetworkBackend>>, replacing the deprecated poll() call in get_rx_frames with drain_to_guest into a reused rx_scratch buffer. Update both VMM cold-boot and snapshot-restore construction sites to coerce Arc<Mutex<SlirpStack>> to the trait object. All 14 baseline tests pass; fmt and clippy clean.
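A minimal sketch of the trait-object switch, assuming simplified signatures (the names follow the PR; the bodies are invented). The key mechanics are the unsized coercion from Arc<Mutex<SlirpStack>> to Arc<Mutex<dyn NetworkBackend>> at the construction site, and the reused rx_scratch buffer replacing the allocating poll():

```rust
use std::sync::{Arc, Mutex};

trait NetworkBackend: Send {
    // Fill `out` with frames destined for the guest; no per-call allocation.
    fn drain_to_guest(&mut self, out: &mut Vec<Vec<u8>>);
}

struct SlirpStack {
    queued: Vec<Vec<u8>>,
}

impl NetworkBackend for SlirpStack {
    fn drain_to_guest(&mut self, out: &mut Vec<Vec<u8>>) {
        out.append(&mut self.queued);
    }
}

struct VirtioNetDevice {
    slirp: Arc<Mutex<dyn NetworkBackend>>,
    rx_scratch: Vec<Vec<u8>>, // reused across calls
}

impl VirtioNetDevice {
    fn get_rx_frames(&mut self) -> usize {
        self.rx_scratch.clear();
        self.slirp.lock().unwrap().drain_to_guest(&mut self.rx_scratch);
        self.rx_scratch.len()
    }
}

fn main() {
    let stack = Arc::new(Mutex::new(SlirpStack { queued: vec![vec![0u8; 64]] }));
    // Unsized coercion: Arc<Mutex<SlirpStack>> → Arc<Mutex<dyn NetworkBackend>>.
    let mut dev = VirtioNetDevice { slirp: stack, rx_scratch: Vec::new() };
    assert_eq!(dev.get_rx_frames(), 1);
    assert_eq!(dev.get_rx_frames(), 0); // already drained
    println!("ok");
}
```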
Type rename only — the slirp.rs module file keeps its name. SlirpBackend reflects the user-mode-NAT role rather than the underlying smoltcp library, keeping naming symmetric with future TapBackend / VhostNetBackend siblings.
Introduces the types and helper needed for ICMP echo NAT (Phase 1):
- IcmpEchoKey {guest_id, dst_ip}: hash key for the echo NAT table.
- IcmpEchoEntry {sock, guest_id, last_activity}: per-request state.
- open_icmp_socket(): opens SOCK_DGRAM/IPPROTO_ICMP (no CAP_NET_RAW).
- icmp_echo: HashMap<IcmpEchoKey, IcmpEchoEntry> field on SlirpBackend,
initialized to HashMap::new() in with_security() (the canonical ctor;
new() and Default both delegate through it).
No behavior change — handle_ipv4_frame is untouched, the map stays
empty. Dead-code allowances are scoped to the new items and will be
removed once tasks 1.2/1.3 wire them in.
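A logic-only sketch of the table shape, with the socket handle mocked as an i32 (the real code stores an unprivileged SOCK_DGRAM/IPPROTO_ICMP socket). The point of keying by {guest_id, dst_ip} is that the same guest echo identifier pinging two destinations yields two distinct NAT entries:

```rust
use std::collections::HashMap;
use std::time::Instant;

#[derive(Hash, PartialEq, Eq, Clone, Copy)]
struct IcmpEchoKey {
    guest_id: u16,   // guest-chosen echo identifier
    dst_ip: [u8; 4], // destination address
}

struct IcmpEchoEntry {
    sock: i32, // placeholder for the unprivileged ICMP socket fd
    guest_id: u16,
    last_activity: Instant, // drives the idle sweep
}

fn main() {
    let mut icmp_echo: HashMap<IcmpEchoKey, IcmpEchoEntry> = HashMap::new();
    let key = IcmpEchoKey { guest_id: 7, dst_ip: [1, 1, 1, 1] };
    icmp_echo.insert(
        key,
        IcmpEchoEntry { sock: -1, guest_id: 7, last_activity: Instant::now() },
    );

    // Same guest_id to a different destination is a distinct NAT entry.
    let other = IcmpEchoKey { guest_id: 7, dst_ip: [8, 8, 8, 8] };
    assert!(icmp_echo.contains_key(&key));
    assert!(!icmp_echo.contains_key(&other));
    println!("entries: {}", icmp_echo.len());
}
```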
drain_to_guest polls the EpollDispatch with a zero-duration timeout and
passes the resulting readiness set to the three relay methods
(relay_tcp_nat_data, relay_icmp_echo, relay_udp_flows).
Each relay now filters by protocol tag (PROTO_TAG_{TCP,UDP,ICMP}) and
only visits flows whose socket appears as EPOLLIN-ready in the event
set, avoiding O(flow_count) reads-on-every-tick.
relay_tcp_nat_data uses a two-pass design: Pass 1 sweeps all TCP entries
for Closed state and idle timeout unconditionally (so a guest FIN that
marks an entry Closed in handle_tcp_frame causes the host TcpStream to
drop promptly, giving the server-side reader an EOF); Pass 2 restricts
the peek/relay I/O to ready entries only.
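The two-pass shape can be modeled without real sockets. This is a schematic sketch (the real Pass 2 peeks host TcpStreams and builds Ethernet frames): Pass 1 reaps Closed entries unconditionally; Pass 2 touches only epoll-ready flows:

```rust
use std::collections::HashMap;

#[derive(PartialEq)]
enum TcpState { Established, Closed }

struct Flow { state: TcpState, relayed: bool }

fn relay_tcp_nat_data(flows: &mut HashMap<u64, Flow>, ready: &[u64]) {
    // Pass 1: sweep every entry for Closed state, readiness-independent,
    // so a guest FIN promptly drops the host-side stream (server gets EOF).
    flows.retain(|_, f| f.state != TcpState::Closed);
    // Pass 2: relay I/O only for flows with an EPOLLIN-ready socket.
    for token in ready {
        if let Some(f) = flows.get_mut(token) {
            f.relayed = true;
        }
    }
}

fn main() {
    let mut flows = HashMap::new();
    flows.insert(1, Flow { state: TcpState::Closed, relayed: false });
    flows.insert(2, Flow { state: TcpState::Established, relayed: false });
    flows.insert(3, Flow { state: TcpState::Established, relayed: false });

    relay_tcp_nat_data(&mut flows, &[2]); // only flow 2 is EPOLLIN-ready
    assert!(!flows.contains_key(&1));     // closed entry reaped in Pass 1
    assert!(flows[&2].relayed);           // ready flow relayed in Pass 2
    assert!(!flows[&3].relayed);          // idle flow untouched
    println!("ok");
}
```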
epoll_arc() added to NetworkBackend trait (Linux cfg-gated, default
None) and overridden on SlirpBackend. VirtioNetDevice.epoll_arc()
delegates to the backend, enabling net_poll_thread (Task 11) to obtain
the shared Arc without an additional lock or refactor.
All 18 baseline pins pass.
Replace the fixed 5 ms sleep with a blocking epoll_wait(50 ms) on the EpollDispatch instance obtained from the network backend. The thread wakes immediately when any registered host socket becomes readable (the relay loop runs at event time, not after a fixed delay) and falls back to a 50 ms housekeeping tick when idle — preserving the UDP/ICMP stale-flow reap path that was previously driven by the 5 ms sleep. If the backend does not expose an epoll instance (non-SlirpBackend, e.g. unit-test mocks), the thread keeps the original 5 ms sleep fallback. All 18 baseline pins pass. Release build clean.
`epoll_fd` is a Linux kernel handle that does not survive snapshot: after `MicroVm::from_snapshot` creates a fresh `SlirpBackend` via `SlirpBackend::new()`, the new `EpollDispatch` starts with zero registered FDs. The current snapshot path does not reconstruct `flow_table` — the backend always starts empty and new flows form naturally — so the rebuild is a no-op today. It is wired in advance so Phase 6.1's half-close work (which will persist restored flows across snapshot/restore) has a ready call site.

Changes:

- `EpollDispatch`: add a `registered_count` field maintained by `register`/`unregister`; expose `registered_fd_count()` under `cfg(any(test, feature = "bench-helpers"))`.
- `SlirpBackend::rebuild_epoll_from_flow_table()`: iterates `flow_table` and re-registers each live host FD (`host_stream`, `sock` for UDP/ICMP) with the current dispatcher.
- `SlirpBackend::registered_fd_count()`: test/bench shim that delegates to `EpollDispatch::registered_fd_count()`.
- `SlirpBackend::reset_epoll_for_snapshot_test()`: replaces the epoll dispatcher with a fresh empty one, simulating the post-snapshot state (kernel handle gone) for unit-level smoke tests.
- `epoll_set_rebuilt_from_flow_table_smoke` in `network_baseline`: insert flow → reset epoll → assert count 0 → rebuild → assert count 1.
The smoke test consumes #[cfg(any(test, feature = "bench-helpers"))]-gated helpers (insert_synthetic_synsent_entry, reset_epoll_for_snapshot_test, registered_fd_count). Integration tests in tests/ don't get cfg(test) on the void-box library crate — they only see #[cfg(feature = "bench-helpers")] items when the feature is enabled. Without this gate, a default `cargo test --test network_baseline` fails to compile with E0599 on the four helper methods.

Now:

- Default cargo test → 18 pins pass, smoke test invisible.
- cargo test --features bench-helpers -- --test-threads=1 → 19 pins pass, smoke test included.

The serial-run requirement side-steps a pre-existing parallel-run flake in tcp_port_forward_inbound_connect_succeeds (host port-bind contention; not a Phase 6.4 regression).
Divan microbench (`tcp_rx_latency_one_packet`) measures the SLIRP-layer per-packet dispatch cost when one TCP flow is Established and the host kernel has data ready: one zero-timeout epoll_wait + readiness scan + peek + Ethernet frame construction. Measured median on this host: ~9.8 µs per drain_to_guest call.

Pre-6.4 the relay iterated every flow in flow_table unconditionally regardless of readiness. Post-6.4 it dispatches only the flows with an epoll EPOLLIN event, reducing wasted work on idle flows to zero. This bench is the regression anchor for that change.

The bench is gated on `--features bench-helpers` (like the existing `tcp_inbound_syn_ack_transition` and `synthesize_inbound_syn` benches). It performs a full 3-way handshake outside the timed loop so only the hot relay path is measured.

Note: this bench cannot exercise the net_poll_thread 50 ms epoll cycle (that thread does not run inside divan). The wall-clock host→guest latency floor is the province of voidbox-network-bench's `tcp_rx_latency_us_p50` field. That field is added to the Report struct in this commit but returns None (deferred): wiring a guest-side listener requires either a guest daemon or an additional exec RPC — both out of scope for Phase 6.4. The divan microbench is the primary numerical deliverable for this phase.
net_poll_thread holds the EpollDispatch mutex for the full 50 ms of its blocking wait. drain_to_guest's own non-blocking wait_with_timeout(ZERO) call contended on the same mutex, serializing the vCPU thread behind the net-poll thread. voidbox-network-bench saw TCP g2h throughput drop from ~1885 Mbps to ~44 Mbps (40× regression).

Fix: SlirpBackend gets a small Mutex<Vec<EpollEvent>> queue. net_poll_thread pushes events into it after each successful wait_with_timeout. drain_to_guest drains the queue (a brief uncontended lock) without touching EpollDispatch. A try_lock fallback path serves unit tests (no net_poll_thread) without blocking on the mutex. The NetworkBackend trait gains a push_ready_events default no-op so SlirpBackend can override it; VirtioNetDevice exposes push_events_to_backend as the trampoline called by net_poll_thread.

Off-CPU profile evidence: drain_to_guest was 9% off-CPU (29.7s in a 60s window) waiting on the epoll mutex; it should drop to near-zero post-fix.
relay_tcp_nat_data's Pass 1 unconditionally copied every TCP FlowKey into a Vec to scan for Closed entries on every drain call. Cache misses hit 47/1K under load; poll_with_n_flows/100 regressed +246% (130ns → 450ns), /1000 regressed +220%.

Fix: when handle_tcp_frame's FIN/RST handlers and mid-function error paths set state=Closed, push the key onto a pending_close Vec. relay_tcp_nat_data drains this Vec at the top of its single ready-events pass — no O(n) collect required. Idle-timeout detection retains a direct flow_table iteration but without allocating a separate key Vec.
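The pending_close idea in miniature (a logic-only sketch; flow states are simplified to strings): the close handler records the key at close time, so the relay drains a short Vec instead of collecting every TCP key per drain call:

```rust
use std::collections::HashMap;

struct TcpNat {
    flow_table: HashMap<u64, &'static str>, // key → state (simplified)
    pending_close: Vec<u64>,
}

impl TcpNat {
    fn handle_fin(&mut self, key: u64) {
        self.flow_table.insert(key, "Closed");
        self.pending_close.push(key); // O(1) bookkeeping at close time
    }

    fn relay_tcp_nat_data(&mut self) {
        // Drain the pending-close list; no O(n) key collection per tick.
        for key in self.pending_close.drain(..) {
            self.flow_table.remove(&key);
        }
    }
}

fn main() {
    let mut nat = TcpNat { flow_table: HashMap::new(), pending_close: Vec::new() };
    for k in 0..1000 {
        nat.flow_table.insert(k, "Established");
    }
    nat.handle_fin(42);
    nat.relay_tcp_nat_data();
    assert_eq!(nat.flow_table.len(), 999);
    assert!(nat.pending_close.is_empty());
    println!("ok");
}
```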
The initial Bug A fix used pending_events.lock() + try_lock(epoll) in drain_to_guest's fast path, adding ~150ns overhead per call vs Phase 6.4 (one extra Mutex acquire). This showed as a +38% regression in the poll_idle bench (441ns → 611ns).

Revised approach: try_lock epoll first (zero cost when uncontended — tests, benches, idle net-poll thread). On Err (net_poll_thread holds the mutex for 50 ms), drain pending_events instead. In production the try_lock fails ~once per 50 ms window; in tests it always succeeds.

Net result: drain_to_guest overhead matches Phase 6.4 when epoll is uncontended; contention is eliminated when net_poll_thread is actively waiting.
Pre-Phase-6.4, net_poll_thread woke unconditionally every 5 ms, so
every ACK queued in inject_to_guest by handle_tcp_frame got flushed
within 5 ms. Phase 6.4's epoll_wait(50 ms) waits for FD readiness
events — but a guest writing data has no FD-side signal (the guest
is the writer; the SLIRP-side socket only becomes readable when the
host responds). So queued ACKs sat 50 ms before being flushed; TCP
send window stalled; voidbox-network-bench TCP g2h dropped from
~1885 Mbps to ~225 Mbps even after the mutex-contention fix.
Fix: track inject_to_guest length around process_guest_frame's
ethertype dispatch. If the call queued any frames, call
epoll_waker.wake() — one byte to the non-blocking self-pipe, which
unblocks net_poll_thread's epoll_wait so the queued frames flush
within microseconds.
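The length-tracking fix can be sketched with a mock waker (the real one writes a byte to the self-pipe registered with epoll). Frame contents and the dispatch condition here are invented for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Mock waker counting wake() calls; the real Waker writes to a
// non-blocking self-pipe that unblocks epoll_wait.
struct Waker { wakes: AtomicUsize }
impl Waker {
    fn wake(&self) { self.wakes.fetch_add(1, Ordering::SeqCst); }
}

fn process_guest_frame(inject_to_guest: &mut Vec<Vec<u8>>, frame: &[u8], waker: &Waker) {
    let before = inject_to_guest.len();
    // Ethertype dispatch would run here; modeled as "TCP frames queue an ACK".
    if frame.first() == Some(&0x06) {
        inject_to_guest.push(vec![0xAC; 54]); // queued ACK frame
    }
    // Queue grew → poke the net-poll thread so the ACK flushes now,
    // not at the end of the 50 ms epoll_wait.
    if inject_to_guest.len() > before {
        waker.wake();
    }
}

fn main() {
    let waker = Waker { wakes: AtomicUsize::new(0) };
    let mut queue = Vec::new();
    process_guest_frame(&mut queue, &[0x06], &waker); // queues a frame → wake
    process_guest_frame(&mut queue, &[0x11], &waker); // queues nothing → no wake
    assert_eq!(waker.wakes.load(Ordering::SeqCst), 1);
    println!("ok");
}
```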
Also fixes the related drain_to_guest event-source ordering bug:
pending_events (filled by net_poll_thread) is now ALWAYS drained
first, with the non-blocking epoll poll only running as a fallback
when the queue is empty (test/bench paths without net_poll_thread).
The previous code took the try_lock branch when net-poll was
between iterations and silently dropped events the net-poll
thread had already pushed.
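A logic-only sketch of the corrected event-source ordering, with Vec<u32> standing in for the real event and dispatcher types: the net-poll thread's queue is always drained first, and the non-blocking dispatcher poll only runs as the fallback for test/bench paths without a net_poll_thread:

```rust
use std::sync::Mutex;

fn collect_events(epoll: &Mutex<Vec<u32>>, pending_events: &Mutex<Vec<u32>>) -> Vec<u32> {
    // Always drain the net-poll thread's queue first so events it has
    // already pushed are never dropped.
    let mut out: Vec<u32> = pending_events.lock().unwrap().drain(..).collect();
    if out.is_empty() {
        // Fallback: a non-blocking poll of the dispatcher itself.
        if let Ok(mut dispatcher) = epoll.try_lock() {
            out = dispatcher.drain(..).collect();
        }
    }
    out
}

fn main() {
    let epoll = Mutex::new(vec![1, 2]);
    let pending = Mutex::new(vec![9]);
    // Pushed events win even though the epoll lock is available —
    // the old try_lock-first order would have silently dropped them.
    assert_eq!(collect_events(&epoll, &pending), vec![9]);
    // With the queue empty, the fallback poll runs (test/bench path).
    assert_eq!(collect_events(&epoll, &pending), vec![1, 2]);
    println!("ok");
}
```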
voidbox-network-bench post-fix:
g2h: ~6000 Mbps (vs master 1885; +3.2x)
bulk-g2h: ~3900 Mbps (vs master 1565; +2.5x at SO_RCVBUF=4096)
rr p50: 2 us (parity with master)
crr p50: ~50 ms (5x regression vs master ~10 ms — separate
bug, tracked in follow-up; the 50 ms is
exactly one epoll_wait cycle and points
to a connection-establishment latency
issue independent of the throughput path)
CRR p50 was regressing +40 ms (10 ms → 51 ms) post-Phase-6.4. The +40 ms exactly matches Linux's TCP delayed-ACK timer, and the cause is that Phase 6.4 widened the net-poll IRQ re-pulse cadence from 5 ms to 50 ms. The Linux guest spends most idle time in HLT and relies on regular vCPU scheduling slots — driven by our IRQ pulses — to advance its TCP delayed-ACK timer. At a 50 ms cadence the guest's pure ACKs ride the next event-triggered IRQ, which can be 40+ ms away. At 5 ms the housekeeping cadence mirrors pre-6.4 and the timer fires on schedule.

We lose Phase 6.4's headline "10x idle-wakeup reduction" goal, but fast-path events still wake immediately via epoll readiness — so the net win vs master is unchanged: g2h throughput +250%, bulk throughput +250%, RR parity, CRR parity.

voidbox-network-bench post-fix:
  g2h:      ~6500 Mbps (vs master 1885; +247%)
  bulk-g2h: ~5400 Mbps (vs master 1565; +245%)
  rr p50:   ~3 us (parity)
  crr p50:  ~10100 us (parity — back to baseline 10 ms)
Per AGENTS.md doc-comment style ("avoid ticket IDs and PR/commit
references inside doc comments and inline comments — they belong
in commit messages and PR descriptions where they're audit trail;
in code they age into noise as the ticketing context evolves").
Phase references fall into the same category. Comments are
rewritten in present tense to explain the structural reasoning
without referencing when each piece landed. Identifiers like
test names and BROKEN_ON_PURPOSE markers are unchanged.
Plan/spec docs in docs/superpowers/plans/ are intentionally
untouched — phase references there ARE the audit trail.
Recovers Phase 6.4's headline 10x idle-wakeup reduction without
re-introducing the +40 ms CRR regression that forced the cadence
back to a fixed 5 ms.
The adaptive policy:
- last cycle had any kernel event → next timeout 5 ms (active)
- last cycle timed out (no events) → next timeout 50 ms (idle)
A single quiet cycle drops us to idle; a single event puts us back
in active in the next cycle.
The subtlety that motivated the additional EpollDispatch change:
when the vCPU thread calls epoll_waker.wake() during a 50 ms idle
wait, the kernel's epoll_wait returns with the self-pipe event.
wait_with_timeout filters that event out and drains the pipe — so
`epoll_events.is_empty()` would have remained true, and the naive
"is_empty ⇒ idle" predicate kept us at 50 ms forever, regressing
CRR p50 back to ~50 ms.
wait_with_timeout now returns the *raw* kernel count (including
self-pipe wakes) so the adaptive policy treats wakes as activity.
Filtered events still arrive in the out parameter unchanged; only
the return value's meaning shifted from "observable count" to
"raw count," which all existing callers ignore.
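The adaptive policy reduces to a pure function of the raw kernel event count. A minimal sketch, with the constants taken from the description above:

```rust
use std::time::Duration;

const ACTIVE_TIMEOUT: Duration = Duration::from_millis(5);
const IDLE_TIMEOUT: Duration = Duration::from_millis(50);

// The *raw* kernel count (including self-pipe wakes) decides the next
// timeout; the filtered event set handed to the relays plays no part.
fn next_timeout(raw_kernel_events: usize) -> Duration {
    if raw_kernel_events > 0 { ACTIVE_TIMEOUT } else { IDLE_TIMEOUT }
}

fn main() {
    // A self-pipe wake() counts as activity even though the filtered
    // event set is empty — the naive "is_empty ⇒ idle" predicate
    // would have stuck at 50 ms forever.
    assert_eq!(next_timeout(1), ACTIVE_TIMEOUT);
    // One quiet cycle drops straight back to the idle cadence.
    assert_eq!(next_timeout(0), IDLE_TIMEOUT);
    println!("ok");
}
```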
voidbox-network-bench post-fix:
g2h: ~6680 Mbps (vs 5 ms fixed: 6500; vs master: +254%)
bulk-g2h: ~5550 Mbps (vs 5 ms fixed: 5400; vs master: +254%)
rr p50: 1 us (in 99-sample iteration; parity)
crr p50: ~10100 us (parity preserved — adaptive correctly
holds 5 ms cadence during connection
bursts because each connection's wake()
keeps raw_kernel_events > 0)
Idle CPU dropped: profile-pre showed net_poll_thread on-CPU 4.93 %
of total at fixed 5 ms cadence (200 wakes/sec); adaptive should
drop to ~10x lower during idle stretches between iterations.
flow_token_for_tcp/udp truncated dst_ip to 16 bits; flow_token_for_icmp omitted dst_ip entirely. Multiple flows could collide on the same token, mis-routing readiness events to the wrong FlowKey.

Fix: replace the lossy encoding with a monotonic AtomicU64 counter per backend. Tokens are still tagged in the high byte for protocol demux (PROTO_TAG_TCP/UDP/ICMP); the lower 56 bits are unique. A new token_to_key HashMap makes readiness → FlowKey lookup O(1) instead of the previous linear flow_table scan.
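The token scheme in sketch form (tag values and the key type are illustrative; the layout — tag in the high byte, monotonic counter in the low 56 bits — follows the description above):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

const PROTO_TAG_TCP: u64 = 0x01;
const PROTO_TAG_UDP: u64 = 0x02;

struct TokenAllocator {
    next: AtomicU64, // per-backend monotonic counter
}

impl TokenAllocator {
    fn alloc(&self, proto_tag: u64) -> u64 {
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        (proto_tag << 56) | (n & 0x00FF_FFFF_FFFF_FFFF)
    }
}

// Protocol demux reads only the high byte.
fn proto_of(token: u64) -> u64 {
    token >> 56
}

fn main() {
    let alloc = TokenAllocator { next: AtomicU64::new(0) };
    let mut token_to_key: HashMap<u64, &str> = HashMap::new();

    let t1 = alloc.alloc(PROTO_TAG_TCP);
    let t2 = alloc.alloc(PROTO_TAG_TCP);
    let t3 = alloc.alloc(PROTO_TAG_UDP);
    token_to_key.insert(t1, "tcp flow A");
    token_to_key.insert(t2, "tcp flow B");

    assert_ne!(t1, t2); // unique, unlike the truncated-dst_ip encoding
    assert_eq!(proto_of(t1), PROTO_TAG_TCP);
    assert_eq!(proto_of(t3), PROTO_TAG_UDP);
    assert_eq!(token_to_key[&t1], "tcp flow A"); // O(1) readiness → key
    println!("ok");
}
```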
net_poll_thread held the Mutex<EpollDispatch> across the blocking epoll_wait call (up to 50 ms in idle cadence). vCPU register/unregister paths in handle_tcp_frame (and friends) had to acquire the same mutex and would block behind the wait, stalling guest TCP SYN handling for up to 50 ms during connection setup.

epoll_ctl and epoll_wait are kernel-thread-safe on the same epoll fd; the only state requiring synchronization was the self-pipe (now eagerly initialized in EpollDispatch::new) and the registered fd count (now AtomicUsize). EpollDispatch becomes Sync without an external Mutex — the type changes from Arc<Mutex<EpollDispatch>> to Arc<EpollDispatch>. register/unregister run lock-free against the wait thread; only the kernel's per-epoll-fd internal lock serializes, and that's a fast path.
to_remove.contains() inside the idle-timeout loop was O(n*k) under churn. Switch the membership check to a HashSet<FlowKey> and only materialize the Vec once at the end for the removal loop.
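The membership-check fix in miniature: Vec::contains is O(k) per probe (O(n·k) over the sweep), while a HashSet probe is O(1); the Vec is materialized once at the end for the removal loop. The flow values and staleness rule below are stand-ins:

```rust
use std::collections::HashSet;

fn main() {
    let flows: Vec<u64> = (0..10_000).collect();
    let mut to_remove: HashSet<u64> = HashSet::new();

    for &key in &flows {
        // Idle-timeout check stand-in: mark every 100th flow stale.
        // contains() on the HashSet is O(1), not O(k) as with a Vec.
        if key % 100 == 0 && !to_remove.contains(&key) {
            to_remove.insert(key);
        }
    }

    // Single materialization for the removal loop.
    let removal_list: Vec<u64> = to_remove.into_iter().collect();
    assert_eq!(removal_list.len(), 100);
    println!("ok");
}
```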
Apply project rust-style rules to the recently-landed Phase 6.4
code (token rewrite + lock-free EpollDispatch refactor):
1. RegisterMode enum replaces (readable: bool, writable: bool) on
EpollDispatch::register. Closed-set policy at the call site
(Read / Write / ReadWrite) over two opaque booleans.
2. matches!() removed at three sites — the project guide prefers
full match (or boolean ==) for compiler diagnostics if the
matched type changes. The unprivileged-ICMP errno check now
uses == comparisons; FlowKey::Tcp counter uses a for loop.
3. Iterator chains in relay loops rewritten as for loops with
mutable accumulators per the project rule. relay_tcp_nat_data,
relay_udp_flows, relay_icmp_echo all touched. Logic unchanged;
control flow now reads top-down without a chain of
.filter().filter_map().collect().
4. Local renamed `rc` → `epoll_ctl_result` at three EpollDispatch
sites. Role-bearing names are required in non-tiny scopes.
5. Dropped redundant explanatory comments around the relay loops
("Data relay — only for flows with…", "Skip entries already
queued for…", "Collect ready ICMP flow keys via…"). The code
below them is self-describing. Kept structural "why" comments
(the ICMP idle-sweep rationale, the per-flow socket Drop
contract).
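Rule 1 can be sketched as follows (the mapping to interest flags is an assumption about what the old boolean pair expressed):

```rust
// Closed-set policy at the call site instead of two opaque booleans
// on EpollDispatch::register.
#[derive(Clone, Copy, PartialEq, Debug)]
enum RegisterMode {
    Read,
    Write,
    ReadWrite,
}

impl RegisterMode {
    // Map the mode to the (readable, writable) pair the old
    // signature exposed positionally.
    fn interest(self) -> (bool, bool) {
        match self {
            RegisterMode::Read => (true, false),
            RegisterMode::Write => (false, true),
            RegisterMode::ReadWrite => (true, true),
        }
    }
}

fn main() {
    // The call site now states policy by name, not by positional bools.
    assert_eq!(RegisterMode::Read.interest(), (true, false));
    assert_eq!(RegisterMode::ReadWrite.interest(), (true, true));
    println!("ok");
}
```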
No behavior change. cargo fmt, clippy -D warnings, network_baseline
(18/18), lib network (23/23), and voidbox-network-bench wall-clock
(g2h ~6580 Mbps, CRR ~32 µs) all green.
Each TCP port-forward rule used to burn one thread that polled TcpListener::accept() with 50 ms sleeps between WouldBlock returns. That thread was both the accept-latency floor and the only piece of networking still polling on a fixed cadence after Phase 6.4. Listener FDs are exactly what EpollDispatch handles.

Bind + register the listener under PROTO_TAG_LISTEN at construction time; the net-poll thread sees readiness, accepts the connection (drains WouldBlock), and feeds the existing InboundAccept channel. No dedicated thread per rule. Drops port_forward_accept_latency from ~50 ms to sub-millisecond (the divan microbench is host TcpStream::connect → first frame in inject_to_guest, now bounded by epoll_wait latency in the active 5 ms cadence rather than a fixed 50 ms poll).

Removes:

- run_port_forward_listener (thread main loop)
- spawn_port_forward_listeners (now bind_port_forward_listeners)
- PORT_FORWARD_POLL_INTERVAL
- port_forward_shutdown Arc<AtomicBool>
- the Drop impl block that joined listener threads
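The accept-drain loop is the part this commit adds; a runnable sketch with epoll itself omitted (a non-blocking listener is drained until WouldBlock, since one EPOLLIN edge can cover several queued connections). Names mirror the commit; the Vec stands in for the InboundAccept channel:

```rust
use std::io::ErrorKind;
use std::net::{TcpListener, TcpStream};

fn process_listener_readiness(listener: &TcpListener, accepted: &mut Vec<TcpStream>) {
    loop {
        match listener.accept() {
            Ok((stream, _peer_addr)) => {
                let _ = stream.set_nonblocking(true);
                accepted.push(stream); // real code feeds the InboundAccept channel
            }
            // Accept queue drained: wait for the next readiness event.
            Err(e) if e.kind() == ErrorKind::WouldBlock => break,
            // Transient error: retry on the next readiness event.
            Err(_) => break,
        }
    }
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // Two queued connections, one "readiness event".
    let _c1 = TcpStream::connect(addr).unwrap();
    let _c2 = TcpStream::connect(addr).unwrap();
    listener.set_nonblocking(true).unwrap();

    let mut accepted = Vec::new();
    process_listener_readiness(&listener, &mut accepted);
    assert_eq!(accepted.len(), 2);
    println!("accepted {} connections", accepted.len());
}
```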
Pull request overview
This PR removes per-port port-forward accept threads (with a 50 ms polling interval) and moves inbound TCP port-forward listeners onto the existing EpollDispatch readiness loop, reducing accept latency to the epoll cadence and aligning the last polling networking component with the Phase 6.4 architecture.
Changes:

- Register each port-forward `TcpListener` FD with `EpollDispatch` using a new listener token tag (`PROTO_TAG_LISTEN`) and a `flow_token_for_listener(host_port)` helper.
- Replace thread-based listener spawning with `bind_port_forward_listeners` + a new `process_listener_readiness` accept/drain loop that runs on the net-poll thread.
- Update ordering in `drain_to_guest` to accept ready listener connections before draining the inbound-accept channel, plus bench/test wording updates.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/network/slirp.rs | Moves port-forward listener accept onto epoll; removes listener threads/shutdown/join; adds listener token tagging and readiness-driven accept handling. |
| benches/network.rs | Updates the port-forward accept-latency benchmark documentation/comments to reflect epoll-driven listeners and expected latency characteristics. |
match listener.accept() {
    Ok((stream, peer_addr)) => {
        let high_port = peer_addr.port();
        let _ = stream.set_nonblocking(true);
Comment on lines +766 to +770:
        sender_failed = true;
        break;
    }
}
let _ = sender_failed; // receiver drop handled gracefully on next tick
This was referenced May 5, 2026
Status: DRAFT. Stacked on top of PR #69 (Phase 6.4 epoll dispatch).
What this branch does
Each TCP port-forward rule used to spawn a dedicated thread that polled `TcpListener::accept()` with 50 ms sleeps between WouldBlock returns. That thread was both the accept-latency floor and the only piece of networking still polling on a fixed cadence after Phase 6.4. This PR moves listener FDs onto the same `EpollDispatch` everything else already uses.

Headline number
`port_forward_accept_latency` (divan): the new median is bounded by the net-poll thread's epoll_wait latency in the active 5 ms cadence, not by a fixed polling interval.
Production wall-clock (unchanged — verifies no regression)
`voidbox-network-bench --iterations 3 --bulk-mb 10`

What changed
- `PROTO_TAG_LISTEN = 0x0400_…` tag + `flow_token_for_listener(host_port)` helper.
- `SlirpBackend::port_forward_listeners` field changes from `Vec<JoinHandle<()>>` to `HashMap<u16, (TcpListener, u16)>`. Listeners live in the struct itself.
- `spawn_port_forward_listeners` (which spawned threads) replaced by `bind_port_forward_listeners` (binds + sets non-blocking + registers with epoll). No threads.
- `process_listener_readiness(&[EpollEvent])` iterates ready listener FDs, drains each (multiple connections may share one EPOLLIN edge), pushes accepted streams through the existing `InboundAccept` channel.
- `drain_to_guest` calls `process_listener_readiness` BEFORE `process_pending_inbound_accepts` so newly-accepted connections land in the same tick.
- `Drop` impl no longer joins listener threads (no threads to join). The `HashMap` of `TcpListener`s drops naturally; epoll registrations are released when `EpollDispatch` is dropped.

Removed
- `run_port_forward_listener` (thread main loop)
- `spawn_port_forward_listeners`
- `PORT_FORWARD_POLL_INTERVAL`
- `port_forward_shutdown: Arc<AtomicBool>`

Tests
- `tcp_port_forward_inbound_connect_succeeds` e2e pin still passes (the contract is correctness, not latency).
- `with_security_spawns_listener_per_tcp_port_forward` renamed to `with_security_binds_listener_per_tcp_port_forward` — same semantics (`port_forward_listeners.len()` now counts bound listeners, was thread handles).
- `port_forward_accept_latency` bench doc-comment updated; the 50 ms ceiling caveat is gone.

Validation
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets --all-features -- -D warnings`
- `cargo test --test network_baseline`
- `cargo test --test network_baseline --features bench-helpers -- --test-threads=1`
- `cargo test --lib network`
- `cargo bench --bench network --features bench-helpers --no-run`
- `cargo build --release`
- `voidbox-network-bench --iterations 3 --bulk-mb 10`

Test plan for review
- `cargo bench --bench network --features bench-helpers port_forward_accept_latency` reports a sub-millisecond median (was 50 ms in PR #69, "Phase 6.4: epoll-driven RX dispatch with adaptive timeout (3.5× throughput vs main)").
- `cargo test --test network_baseline tcp_port_forward_inbound_connect_succeeds` passes.
- No `PORT_FORWARD_POLL_INTERVAL` or `port_forward_shutdown` anywhere in `src/`.